Fill bad time intervals with fake data #782

matteobachetti · 2023-12-08T16:32:51Z

~~Depends on #754~~

Changes can be seen in action in StingraySoftware/notebooks#76

Also, resolve #612

pep8speaks · 2023-12-08T16:33:10Z

Hello @matteobachetti! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file stingray/base.py:

Line 2254:81: E203 whitespace before ':'
Line 2255:60: E203 whitespace before ':'
Line 2266:81: E203 whitespace before ':'
Line 2267:60: E203 whitespace before ':'

Comment last updated at 2024-01-11 09:46:05 UTC

codecov · 2023-12-08T16:35:46Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (490a7c7) 96.31% compared to head (de2ee3e) 96.33%.
Report is 5 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| stingray/lightcurve.py | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #782      +/-   ##
==========================================
+ Coverage   96.31%   96.33%   +0.02%     
==========================================
  Files          43       43              
  Lines        8497     8548      +51     
==========================================
+ Hits         8184     8235      +51     
  Misses        313      313


mgullik

Hi @matteobachetti,
this is a great additional tool for the software.
A few things:
In a lightcurve-like object, the filling-the-gap routine is easier to understand. We know the time resolution of the lightcurve and the routine creates time bins with a value of count / countrate.

In a StingrayTimeseries-like object, it is more complicated: how do you choose the arrival times that are used to fill the gap? From the docstring I read
"Random data are extracted by randomly repeating the values of nearby good data"
This is not entirely clear. If you repeat the values of the arrival times, you don't fill the empty gaps, do you?

More importantly, we should find a way to test that using the function fill_bad_time_intervals on a StingrayTimeseries and then making a lightcurve is equivalent to using the function fill_bad_time_intervals on the lightcurve made from the original StingrayTimeseries (with the gaps). I guess the two cases can't lead to the same numbers, because of the random function, but they should be close enough.
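A minimal sketch of such a consistency check, written with the standard library only and not with the Stingray API. The helper `bin_events`, the constant-rate source, and the use of the known true rate to fill the event list are all assumptions made for illustration; the point is only that the two routes should agree on average, not bin by bin.

```python
import random
import statistics


def bin_events(times, tstart, tstop, dt):
    """Histogram event times into counts per bin of width dt (hypothetical helper)."""
    n_bins = int(round((tstop - tstart) / dt))
    counts = [0] * n_bins
    for t in times:
        idx = int((t - tstart) // dt)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


rng = random.Random(42)
rate, tstop, dt = 50.0, 100.0, 1.0
gap = (40.0, 42.0)

# Events from a constant-rate source, minus those falling in the gap.
events = [rng.uniform(0, tstop) for _ in range(int(rate * tstop))]
events = [t for t in events if not gap[0] <= t < gap[1]]

# Route 1: fill the event list (here with the known rate), then bin.
n_fill = int(round(rate * (gap[1] - gap[0])))
filled_events = events + [rng.uniform(*gap) for _ in range(n_fill)]
route1 = bin_events(filled_events, 0, tstop, dt)

# Route 2: bin first, then fill the bad bins by resampling good bins.
route2 = bin_events(events, 0, tstop, dt)
good = [c for i, c in enumerate(route2) if not gap[0] <= i * dt < gap[1]]
for i in range(len(route2)):
    if gap[0] <= i * dt < gap[1]:
        route2[i] = rng.choice(good)

# The routes differ bin by bin but should agree statistically.
relative_difference = abs(
    statistics.mean(route1) - statistics.mean(route2)
) / statistics.mean(route1)
```

A real test would compare against a tolerance derived from the expected Poisson scatter rather than a fixed threshold.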

In a lightcurve-like object:

- What happens if the gaps are smaller than the dt?
- Can I fill only certain gaps and not others? For example, I want to fill the gaps in the first half of the light curve and not in the second half.
- Can we include the possibility of filling the gap with a linear interpolation between the two edges of the GTIs adjacent to the BTI? Does it make sense?

Additional tests:

- Test multiple gaps, not only one gap to fill.

Some specific comments are in the code.


mgullik · 2024-01-03T13:48:40Z

stingray/tests/test_base.py

+        ev_like_filt.gti = np.asarray([[0, 498], [500, 900], [950, 1000]])
+        ev_new = ev_like_filt.fill_bad_time_intervals()
+
+        assert np.allclose(ev_new.gti, self.gti)


In this case, what happens to the time array?
In principle, ev_new.time != ev_like_filt.time, because the BTI has been filled with random arrival times, right?
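For the GTI side of this question, a small sketch of what filling a gap implies: once a BTI is filled, it can be merged with the neighbouring GTIs. The function `merge_filled` is a hypothetical helper, not part of Stingray, and the value of `self.gti` in the test is not shown in this excerpt, so the result below is only what the merge logic itself produces.

```python
def merge_filled(gtis, filled_btis):
    """Merge GTIs with the BTIs that were filled (hypothetical helper)."""
    intervals = sorted(list(gtis) + list(filled_btis))
    merged = [list(intervals[0])]
    for start, stop in intervals[1:]:
        if start <= merged[-1][1]:
            # Touching or overlapping: extend the current interval.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(m) for m in merged]


gtis = [(0, 498), (500, 900), (950, 1000)]
# Assume only the short 2 s gap was filled; the 50 s gap is left alone.
new_gtis = merge_filled(gtis, [(498, 500)])
```

The time array, in contrast, gains new entries inside the filled gap, so it is expected to differ from the input.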


matteobachetti · 2024-01-04T10:50:35Z

@mgullik thanks for the thorough review, which I tried to address in my new changes. A few remaining questions, which I haven't answered yet, probably need some discussion:

> In a lightcurve-like object, the filling-the-gap routine is easier to understand. We know the time resolution of the lightcurve and the routine creates time bins with a value of count / countrate.
>
> In a StingrayTimeseries-like object, it is more complicated, how do you choose the arrival times that are used to fill the gap? From the docstring I read
> "Random data are extracted by randomly repeating the values of nearby good data"
> This is not extremely clear. If you repeat the values of the arrival time, you don't fill the empty gaps, do you?

I improved the docstring, explaining how uniformly and non-uniformly sampled data are treated differently. Basically, the only difference is that times are assigned on a fixed grid for uniformly sampled data, and randomized with the same count rate as in the buffer for non-uniformly sampled data.
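The distinction can be sketched in a few lines of plain Python. This is an illustration of the idea described above, not the code in `stingray/base.py`; the function `fill_gap` and its parameters (`buffer_size`, `dt`, `seed`) are hypothetical names chosen for the example.

```python
import random


def fill_gap(times, gap_start, gap_stop, buffer_size, dt=None, seed=None):
    """Return sorted times with synthetic samples added inside one gap.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    if dt is not None:
        # Evenly sampled: one sample at the center of every missing bin.
        n_bins = int(round((gap_stop - gap_start) / dt))
        new_times = [gap_start + (i + 0.5) * dt for i in range(n_bins)]
    else:
        # Unevenly sampled: estimate the rate in buffers on both sides ...
        n_near = sum(
            1
            for t in times
            if gap_start - buffer_size <= t < gap_start
            or gap_stop <= t < gap_stop + buffer_size
        )
        rate = n_near / (2 * buffer_size)
        # ... and draw the matching number of times uniformly in the gap.
        n_new = int(round(rate * (gap_stop - gap_start)))
        new_times = [rng.uniform(gap_start, gap_stop) for _ in range(n_new)]
    return sorted(list(times) + new_times)


# Unevenly sampled case: events every 0.5 s with a gap between 40 and 44 s.
events = [0.5 * i for i in range(200) if not 40 <= 0.5 * i < 44]
filled_events = fill_gap(events, 40, 44, buffer_size=10, seed=1)

# Evenly sampled case: 1 s bins (centers at 0.5, 1.5, ...) with the same gap.
bins = [i + 0.5 for i in range(100) if not 40 <= i < 44]
filled_bins = fill_gap(bins, 40, 44, buffer_size=10, dt=1.0)
```

In the uneven case the new arrival times are genuinely new values, not copies of existing ones; only other attributes would be resampled from nearby good data.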

> More importantly, we should find a way to test that using the function fill_bad_time_intervals on a StingrayTimeseries and then making a lightcurve is equivalent to using the function fill_bad_time_intervals on the lightcurve made from the original StingrayTimeseries (with the gaps). I guess the two cases can't lead to the same numbers, because of the random function, but they should be close enough.

Why do you think this is important? A uniformly sampled time series should behave just like a light curve, and tests in both time series and light curves all pass independently.

> In a lightcurve-like object
>
> what happens if the gaps are smaller than the dt?

The light curve machinery should cover this: bins partially outside GTIs are just treated as if they were outside GTIs.
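As a rough illustration of that rule, assuming bins are identified by their centers and a bin counts as good only if it lies entirely inside a single GTI (the helper `bin_in_gti` is hypothetical, and Stingray's own mask logic may differ in details such as edge tolerance):

```python
def bin_in_gti(time, dt, gtis):
    """True only if the whole bin [time - dt/2, time + dt/2] lies in one GTI."""
    lo, hi = time - dt / 2, time + dt / 2
    return any(start <= lo and hi <= stop for start, stop in gtis)


dt = 1.0
times = [i + 0.5 for i in range(10)]  # bin centers 0.5 ... 9.5
gtis = [(0.0, 4.0), (4.3, 10.0)]      # a 0.3 s gap, shorter than dt
mask = [bin_in_gti(t, dt, gtis) for t in times]
```

Here the bin centered at 4.5 overlaps the short gap, so the whole bin is flagged as bad and would be a candidate for filling.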

> can I fill only certain gaps and not others? For example, I want to fill the gaps in the first half of the light curve and not in the second half

I don't see a use case for this. What would be the application? I guess one could split the light curve, fill the gaps in the first half, and then join the two chunks back together.
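The split, fill, and join workaround could look roughly like this, using plain lists of counts and a boolean good-bin mask in place of Stingray objects. The helper `fill_masked` is hypothetical and simply resamples good bins from the same chunk.

```python
import random


def fill_masked(counts, good, rng):
    """Replace bad bins with values drawn from the good bins of this chunk."""
    pool = [c for c, g in zip(counts, good) if g]
    return [c if g else rng.choice(pool) for c, g in zip(counts, good)]


rng = random.Random(0)
counts = [10, 12, 0, 11, 9, 0, 10, 13]
good = [True, True, False, True, True, False, True, True]

# Split in two halves, fill only the first, and join them back together.
half = len(counts) // 2
first = fill_masked(counts[:half], good[:half], rng)
joined = first + counts[half:]
```

The gap in the second half (index 5) is left untouched, while the one in the first half (index 2) receives a resampled value.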

> can we include the possibility of filling the gap with the linear interpolation between the two edges of GTIs adjacent to the BTI? Does it make sense?

We could. Again, what would be the use case? While using random data tries to preserve the statistical properties of the data set, the linear interpolation would knowingly alter that.
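A toy comparison makes the difference concrete. It assumes a white-noise light curve and measures the scatter between consecutive bins inside the gap; the numbers and the `scatter` helper are illustrative and not taken from the PR.

```python
import random
import statistics


def scatter(values):
    """Standard deviation of the differences between consecutive bins."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return statistics.pstdev(steps)


rng = random.Random(3)
# White-noise light curve: values scattered around 100 with sigma = 10.
counts = [rng.gauss(100, 10) for _ in range(1000)]
gap = range(475, 525)

# Option 1: straight line between the two bins bordering the gap.
left, right = counts[gap.start - 1], counts[gap.stop]
interpolated = [
    left + (right - left) * (i - gap.start + 1) / (len(gap) + 1) for i in gap
]

# Option 2: resample values from the good bins.
good = counts[: gap.start] + counts[gap.stop :]
resampled = [rng.choice(good) for _ in gap]

scatter_interpolated = scatter(interpolated)
scatter_resampled = scatter(resampled)
```

The interpolated segment has essentially no bin-to-bin scatter, so it removes the noise that the rest of the light curve has, while the resampled segment keeps a scatter comparable to the good data.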

> Additional tests:
>
> test multiple gaps and not only one gap to fill

Done

> some specific comments in the code

I think I addressed those one by one

dhuppenkothen

Looking good! Some comments and thoughts on the limits of what I can do, which I think should probably be written down somewhere, but not necessarily here (maybe in a tutorial?).


dhuppenkothen · 2024-01-10T14:32:05Z

stingray/base.py

+        ----------------
+        max_length : float
+            Maximum length of a bad time interval to be filled. If None, the criterion is bad
+            time intervals shorter than 1/100th of the longest bad time interval.


Where does 1/100 come from? That seems maybe a bit arbitrary?

matteobachetti · Jan 11, 2024

It's actually 1% of the longest good time interval. It's just a small length by default, so that we don't alter the statistical properties of the data too much.
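A sketch of that default criterion, under the assumption stated in the reply (1% of the longest good time interval). The function `fillable_btis` is a hypothetical helper, and whether the real comparison is strict or inclusive is not shown in this thread.

```python
def fillable_btis(gtis, max_length=None):
    """Return the gaps between GTIs that are short enough to be filled."""
    gtis = sorted(gtis)
    # Gaps between consecutive GTIs are the bad time intervals.
    btis = [
        (a_stop, b_start)
        for (_, a_stop), (b_start, _) in zip(gtis, gtis[1:])
        if b_start > a_stop
    ]
    if max_length is None:
        # Assumed default: 1% of the longest good time interval.
        longest_gti = max(stop - start for start, stop in gtis)
        max_length = longest_gti / 100
    return [(start, stop) for start, stop in btis if stop - start < max_length]


gtis = [(0, 498), (500, 900), (950, 1000)]
default_selection = fillable_btis(gtis)
explicit_selection = fillable_btis(gtis, max_length=100)
```

With the GTIs from the test above, the longest GTI is 498 s long, so the default threshold is 4.98 s: the 2 s gap qualifies and the 50 s gap does not.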


dhuppenkothen

LGTM! 👍 Assuming @mgullik is happy with it, too, this can get merged, I think?

matteobachetti marked this pull request as draft December 8, 2023 16:32

matteobachetti force-pushed the fill_btis_with_fake_data branch from 9357c6a to b14a586 Compare December 15, 2023 13:36

matteobachetti marked this pull request as ready for review December 15, 2023 13:38

matteobachetti requested review from dhuppenkothen and mgullik December 15, 2023 13:40

matteobachetti force-pushed the fill_btis_with_fake_data branch 2 times, most recently from ac336a8 to 7ad8a61 Compare December 31, 2023 13:17

mgullik reviewed Jan 3, 2024


matteobachetti added 14 commits January 3, 2024 16:17

Add side parameter to find_nearest

76b5e56

Add function to randomize data in small bad time intervals

5b50e65

Add changelog

f74b9cc

Pay attention to edges

b91a8d3

Rename options

3e6e4b4

Fix syntax

b2e9817

Test that only the attributes we want are randomized

1352e9d

Test case with no btis

4d444d4

Add fake data to docs

9c52405

Fix plotting

2b2eebb

Fix deprecation in plot

fc1e77c

Fix deprecation in plot

988dc30

Fix corner cases and warnings

7447456

Improve docstring

11cf78e

matteobachetti force-pushed the fill_btis_with_fake_data branch from 7bf79e7 to 11cf78e Compare January 3, 2024 15:53

Test more keywords and more bad intervals

11c01d8

dhuppenkothen requested changes Jan 10, 2024


matteobachetti added 2 commits January 11, 2024 09:16

Top-level import when appropriate

32b6570

Add warning about applying the technique only to *very* short gaps

47d99d7

matteobachetti added 5 commits January 11, 2024 09:32

Change uniform to even, to avoid confusion between uniformly distributed and evenly sampled

973fcc8

Cleanup

8133062

Further warning

8f8243d

Fix docstring

aac044b

Change buffer default behavior

ee47f7e

matteobachetti force-pushed the fill_btis_with_fake_data branch from 88468da to 8248a04 Compare January 11, 2024 09:34

matteobachetti added 2 commits January 11, 2024 10:45

Improve compatibility with Numpy 2.0

51a3369

Fix rst issue in docstring

de2ee3e

matteobachetti force-pushed the fill_btis_with_fake_data branch from 8248a04 to de2ee3e Compare January 11, 2024 09:45

matteobachetti requested a review from dhuppenkothen January 11, 2024 09:51

dhuppenkothen approved these changes Jan 11, 2024


mgullik approved these changes Jan 11, 2024


matteobachetti added this pull request to the merge queue Jan 11, 2024

Merged via the queue into main with commit 33647b4 Jan 11, 2024
16 checks passed

matteobachetti deleted the fill_btis_with_fake_data branch January 29, 2024 20:18
